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Abstract 


REDUCING NONPOSTIVE DEFINITENESS 


Deciding which random effects to retain is a central decision in mixed effect models. Recent 
recommendations advise a maximal structure whereby all theoretically relevant random effects 
are retained. Nonetheless, including many random effects often leads to nonpositive definiteness. 
A typical remedy is to simplify the random effect structure by removing random effects or 
associated covariances. However, this practice is known to bias estimates of remaining 
covariance parameters and compromise fixed effect inferences. Cholesky decompositions 
frequently are suggested as an alternative and are automatically implemented in some software. 
Instead of Cholesky decompositions, we describe factor analytic structures as an approach to 
avoid nonpositive definiteness. This approach is occasionally employed in biosciences like plant 
breeding, but, ironically, has not been established in behavioral sciences despite the close 
historical connection with factor analysis in these fields. We discuss how a factor analytic 
structure facilitates estimation and conduct simulations to compare convergence and 
performance to simplifying the random effects structure or Cholesky decomposition approaches. 
Results show a lower rate of nonpositive definiteness with the factor analytic structure than 
Cholesky decomposition and suggest that factor analytic covariance structure may be useful to 


combating nonpositive definiteness, especially in models with many random effects. 
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Reducing Incidence of Nonpositive Definite Covariance Matrices in Mixed Effect Models 

Multilevel and mixed effects models in psychology have become mainstream for 
modeling data that have a clustered structure such that people are nested within organizational 
units (e.g., cross-sectional data such as students in schools) or longitudinal data where repeated 
measures are nested within people (e.g., Grimm et al., 2016; Hox et al., 2017). A benefit of these 
models is that random effects are included to allow the coefficients to be specific to each higher- 
level unit. In cross-sectionally clustered data, this means that the relation between a predictor and 
the outcome is allowed to differ across organizational units to capture attributes specific to that 
unit. In longitudinal data, this means that growth curves are person-specific to allow for 
variability in change over time. 

A key decision point in specifying these models is which random effects to include in the 
analysis. Barr et al. (2013) recommend to include the maximum number of relevant random 
effects and random effect correlations in models required to satisfy the theory, which has been 
referred to as maximal random effect structure. They argue that doing so maximizes the 
generalizability of studies across samples and can protect against adverse effects from possible 
model misspecifications. The advice from Barr et al. (2013) is sound theoretically, but Eager and 
Roy (2017) note that it may not always be pragmatic because random effect variances and 
correlations are difficult to estimate and models with multiple random effects are prone to 
nonconvergence or inadmissible solutions. Bates et al. (2018) go further stating that “the advice 
to ‘keep it maximal’ often creates hopelessly over-specified random effects because the number 
of correlation parameters to estimate rises quickly with the dimension of the random-effects 
vectors” (p. 18). Chief among these convergence issues is the dreaded nonpositive definite 


covariance matrix, which often manifests when a variance is estimated at or below 0 or when the 
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magnitude of the correlation between two random effects reaches or exceeds an absolute value of 
1. More generally, a matrix is considered positive definite only if all its eigenvalues are positive 
and considered semipositive definite if all eigenvalues are non-negative (Wothke, 1993). All 
covariance matrices are at least semipositive definite, so nonpositive definiteness indicates that 
the matrix cannot be reflective of the actual relations between variables (Pinheiro & Bates, 

1996). In mixed effects models, a nonpositive definite covariance matrix implies that the 
distribution of random effects is degenerate and that the dimensionality of the specified random 
effects may exceed what the data can support.! 

Barr et al. (2013) consider confirmatory research designs common in psycholinguistics 
and related cognitive sciences where most or all predictors are manipulated conditions, so the 
number of possible random effects is often a relatively small number and dictated by the 
experimental design. For studies situated further towards the exploratory end of the spectrum or 
those with observational designs, the statistical model is not necessarily dictated by the design 
and multiple models with different combinations of predictors are often considered. In such 
cases, there is not necessarily one maximal random effect structure. Alternatively, if a researcher 
tried to include every possible predictor with a random effect, there would be concerns about 
overfitting (Raudenbush & Bryk, 2002, p. 256). The convergence difficulties described by Eager 
and Roy (2017) are more influential with observational data because the final model can be 


unduly influenced simply by what will and will not converge during model building. 


' We are interested in the positive definiteness of the random effect covariance matrix itself here, but also note that 
the positive definiteness of the Hessian matrix can also be a problem (Gill & King, 2004). The Hessian matrix 
contains the second partial derivatives of the loglikelihood with respective to each parameter in the model and its 
negative inverse is used to obtain the standard errors in maximum likelihood estimation (e.g., Efron & Hinkley, 
1978). If the Hessian is nonpositive definite, it cannot be inverted and the standard errors are either untrustworthy or 
unreliable. Reasons for nonpositive definite Hessian matrices include redundancy between variables or parameters, 
an overparameterized model, or a saddle point on the likelihood surface (Cudeck, Klebe, & Henly, 1993). 
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The problem of nonpositive definite covariance matrices is exacerbated in the common 
context where the sample size at the higher-level is small or moderate (Hox & McNeish, 2020). 
For example, in the context of modeling curvilinear growth, Diallo et al. (2014) note that the 
variance of quadratic slopes across people is extremely difficult to estimate with convergence 
often being less than 50% across simulation conditions. The number of repeated measures is 
closely tied to convergence rates, which fell below 30% in their study with four repeated 
measures. When a maximal random effect structure is attempted but a nonpositive definite 
matrix is encountered, a pragmatic solution is to progressively simplify the random effect 
structure (Barr et al., 2013; Kiernan, 2018; Miyazaki & Frank, 2006). This typically involves 
removing random effects or the covariances among random effects until convergence is achieved 
and is a popular approach in empirical studies (Baird & Maxwell, 2016; Wang et al., 2019). 
Though this can aid in achieving convergence and ridding oneself of a nonpositive definite 
covariance matrix, it can lead to covariance structure misspecifications, which are known to bias 
covariance parameter estimates and affect Type-I error rates for fixed effects (Diggle et al., 2002; 
Ferron et al., 2002; Hoffman, 2015; Weiss, 2005), which are typically inflated but can also 
become deflated in some circumstances (Wang et al., 2019). For instance, the simulation in Barr 
et al. (2013) reported that the Type-I error rates for model omitting random slopes were between 
2 and 4 times the nominal rate, depending on conditions. As an example of the potential 
ramifications of removing random effects, Fisher, Hahn, deBruine, and Jones (2015) was 
retracted after the authors realized that including random slopes rendered the interaction effects 
of interest non-significant. Typically, removing random effects is neither substantively justified 
nor desirable. For instance, when fitting a quadratic growth model, a researcher might encounter 


a nonpositive definite covariance matrix wherein the quadratic random effect has non-zero 
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variance but correlates perfectly with the linear random effect. Should the researcher then 
remove this random effect because of the redundancy with the linear slope random effect? Doing 
so would imply that there are no individual differences in the curvature of the trajectories. This 
outcome is both at odds with the researcher’s original hypothesis that individual trajectories 
would differ in curvature and is inconsistent with the non-zero variance estimate obtained for the 
quadratic effect. Moreover, the researcher would be making the dubious assumption that the 
perfect correlation of these random effects reflects some truth about their underlying structure in 
the population, rather than simply estimation difficulties with the available sample data. 

In this paper, we propose that specifying a maximal random effect structure in a different 
way can greatly reduce the incidence of nonpositive definite covariance matrices, especially with 
longitudinal data containing few people or few repeated measures. In practice, a maximal 
random effect structure is parameterized almost exclusively via an unrestricted structure 
whereby each element in the covariance matrix is represented by a unique parameter that is 
explicitly estimated. Instead, we suggest using a factor analytic structure, which was first 
proposed by Jennrich and Schluchter (1986) and extended by Miyazaki and Frank (2006). The 
general idea of the factor analytic structure is to decompose a maximal structure into loadings 
and uniquenesses with a factor model (typically a one-factor model, but more factors are 
possible). That is, the covariance matrix is decomposed by factor analyzing the random effects 
within the same model. This permits retention of each element of a maximal random effect 
structure as in the unrestricted structure; however, in the factor analytic approach, each element 
is the product of other parameters rather than being a parameter that is directly estimated. 

Using a factor analytic structure over an unrestricted structure has received attention in 


biostatistics to address estimation issues, especially in plant breeding (Piepho, 1997, 1998) and 
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spatial or high-dimensional mixed effect models (Smith et al., 2001; Wang, Wang, Hedeker, & 
Chen, 2019). The benefit of the approach is that it is more numerically stable with larger 
numbers of random effects while providing the same information as the traditional unrestricted 
structure, often with fewer parameters (Meyer, 2009). However, despite use in biostatistics and 
inclusion in mainstream software like SPSS and SAS Proc Mixed, applications of factor analytic 
covariance structures are notably absent in psychology, which is somewhat ironic given 
psychology’s close historical connection with factor analysis. Indeed, the biostatistics literature 
on this topic references psychology’s influence in the development of factor analysis, yet 
psychologists remain mostly unaware of its potential advantages in addressing issues related to 
nonpositive definite random effect covariance matrices. 

The purpose of this paper is therefore to demonstrate the advantages of a factor analytic 
structure for the random effect covariance structure within mixed effects model applications in 
psychological research and to compare its performance to other existing methods. For expository 
purposes, we focus on the growth-modeling context, with the understanding that the benefits of 
the factor analytic structure extend to cross-sectional contexts as well. We start with a brief 
overview of growth modeling in the mixed effect model framework. We then provide a 
motivating empirical example where the estimated random effect covariance matrix is 
nonpositive definite, reporting the results obtained under the typical approach of assuaging 
nonpositive definiteness by simplifying the random effect structure. We follow with an overview 
of alternative ways to address convergence difficulties, with particular focus on how the factor 
analytic covariance structure approach can facilitate estimation and numerical stability of 
covariance matrices. We then reanalyze the data from our motivating example to show how the 


factor analytic method alleviates the nonpositive definiteness of the covariance matrix without 
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requiring any simplifications. We also note how several of the conclusions change when all 
random effects are retained in the model with this structure. To generalize this point, we conduct 
a simulation study to show how the factor analytic method reduces the rate of nonpositive 
definite matrices across multiple conditions and also improves statistical properties of estimates 
such as the relative bias of covariance parameters and Type-I error rates of fixed effects. We then 
show how the same principles apply to data with cross-sectional clustering using an empirical 
example where students are clustered within schools. We end with a discussion of the limitations 
and future directions. 
Overview of Growth Models 

Mixed effect models treat data univariately such that repeated measures are a single 
variable but are indexed by a second Time variable such that each repeated measure occupies a 
unique row and a single individual’s data spans multiple rows (i.e., the data are long). Time is 
then included as an explicit predictor in the model and growth is captured by a regression 
coefficient associated with Time. These coefficients can vary across people via random effects, 
meaning that each person has their own person-specific growth curve. 

In multilevel notation from Raudenbush and Bryk (2002), a basic unconditional linear 


growth model can be written as 


¥, = By + 8, Time, +e, 
Boi = Yoo + oi (1) 
Bs =71o + Uy; 


The first expression is a typical regression model where the outcome y for the ith person at the th 


time is equal to an intercept ( ,, ) plus the linear slope ( Z,,) times the fth value of Time plus a 
residual (e,,) for the ith person at time f. The difference between a standard single-level regression 


and a growth model is that the regression coefficients ( GZ ) have i subscripts, meaning that they 
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vary across people. Correspondingly, each person has a unique intercept and slope which 
produces person-specific growth trajectories. These person-specific intercepts and slopes are 
directly modeled in the second and third expressions in Equation 1. The person-specific intercept 


is equal to the average intercept across all people (7,, ) plus a person-specific random effect (1), ) 


which captures the deviation of the ith person’s intercept from the average intercept. The person- 


specific slope is equal to the average slope across all people (y,, ) plus a person-specific random 
effect (u,,) which captures the deviation of the ith person’s slope from the average slope. 


A common growth model applied in empirical studies is the quadratic polynomial growth 
model, whose popularity is derived from its ability to capture curvilinear growth while also being 
easy to specify due to the model being linear in the parameters (Grimm et al., 2016, p. 209). 
Though there are reservations about the quadratic polynomial model prioritizing ease of 
implementation over interpretability (Blozis, 2004; Cudeck, 1996; Cudeck & du Toit, 2002; 
Preacher & Hancock, 2015), we focus on the model here because it is a commonly used model 
whose convergence issues are well-appreciated. That is, the convergence issues we discuss in 
this paper appear in a variety of contexts such as cross-sectional multilevel models, cross- 
classified multilevel models, and longitudinal models and the quadratic growth model serves as a 
case in point rather than the sole interest. 


The unconditional quadratic polynomial growth model can be written as 


Yu = Bo; +B, Time, + f,,Time;, te, 
Bri = Yoo + Uo 
Bi = Mo +My; 
Bai = V9 + Ur; 


(2) 


After adding a second-order polynomial term for Time, the intercept is the predicted value at 


Time = 0, the linear slope is the instantaneous growth rate at Time = 0 (1.e., the slope of the 
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tangent line at the intercept), and the quadratic slope captures half the acceleration of the growth 
curve. Usually, the zero-point for Time is placed at the origin of the observation window so that 
the intercept represents initial status and the linear slope represents initial rate of change but 
other placements and interpretations are equally valid (Singer, 1998). For instance, as we will see 
in our demonstration, the zero-point can also be placed at the end of the observation window so 
that intercepts and linear slopes represent final status and final rates of change. The highest-order 
effect, the quadratic slope, is unaffected by this choice. 
The individual values of the random effects are not explicitly estimated but instead are 
assumed to come from a multivariate normal distribution with a zero mean vector and an 


estimated covariance matrix: u, ~ MVN(0,G). Researchers can choose a structure for the random 


effect covariance matrix G where the most common structure is unrestricted. For the model in 


Equation 2 containing three random effects, an unrestricted structure would be 


t, 
00 
G=|T Ty where the diagonal terms are variances and off-diagonal terms are 


To T 41 To 
covariances. All elements of the matrix are explicit parameters that are directly estimated without 
constraints, other than possible boundary constraints to keep variances from becoming negative. 
An alternative way to fit the unrestricted structure is to directly estimate the standard deviation of 


each random effect and the correlation between random effects and use these values to derive the 


2 
Ayo 
variances and covariances, G =| 9,,0,,0,, Ge where p designates correlations and 
2 
P9922 9% p0 P9579, Ay, 


@ represents standard deviations such that 0 = Vr : 


Importance of the Random Effect Covariance Matrix 
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In addition to being important for quantifying the variability in coefficients across people, 
proper specification of the random effect covariance matrix is an explicit assumption for proper 
inference in mixed effect models because it is directly related to estimation of the fixed effect 
standard errors. To show this, it is helpful to rewrite Equation 1 in general matrix notation from 
Laird and Ware (1982) such that 

y,=X,y+Zu, +e, (3) 
where y, is a vector of scores containing person i's repeated measures, X, is a design matrix for 
the fixed effects containing person i’s values for the predictor variables (e.g., 1, Time, and Time 
for quadratic growth), y is a vector of fixed effect coefficients, Z,is a design matrix for the 
random effects containing person 7’s values of variables whose effect varies across people, u, is a 
vector of person i's random effects, and e, is a vector of person i's residuals such that 
e, ~ MVN(0,R,). The residual covariance matrix is usually assumed to be diagonal, i.e., 

R, = diag (o:; Oa avn, ) , where o,, might be freely estimated or estimated subject to some 
constraint to model heteroskedasticity as a function of time with few parameters (e.g., 

o, =o" exp[@xTime, | ). In other situations, the residual covariance matrix is assumed to be 
homoskedastic, i.e., R, =o7I, . For closely spaced repeated measures, serial correlation 


structures could also be considered (Fitzmaurice et al., 2012). 
In a mixed effect model, the model-based estimate of the sampling variability of the fixed 
effect coefficients is calculated by 


Cov(i) =| XIX, (4) 


i=l 
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where V, is the estimated marginal covariance such that V, = Z,GZi +R, . The standard errors 


are then calculated by taking the square root of the diagonal elements of Equation 4. To the 
extent that the structure for the random effect covariance matrix G and the residual covariance 


matrix R, are incorrect, these misspecifications will permeate to the fixed effect standard errors 


and can yield incorrect p-values and lead to incorrect inferences (Laird & Ware, 1982). Wang et 
al. (2019) analytically show the impact that different types of misspecifications have on the fixed 
effect standard errors for different types of estimation methods. This represents one motivation 
for the maximal random effect structure: if relevant random effects are excluded or relations 
between random effects are not captured, researchers not only miss out on the ability to capture 
the variability in the coefficient across people but also increase their susceptibility to faulty 
inference for the fixed effects. Though corrected standard errors can be computed that are robust 
to covariance matrix misspecifications, these are less efficient than correctly modeling the 
covariance (Gurka et al., 2011), less effective with smaller samples (McNeish & Stapleton, 
2016), and cannot be used in conjunction with the more sophisticated degree of freedom methods 
that provide optimal inference (SAS Institute Inc., 2018b, p. 6551). 
Motivating Example 

Consider data from a randomized trial comparing treatments for schizophrenia that 
originally appeared in Davis (2002). The study contains 40 male patients who are randomly 
assigned to one of two treatment groups such that each group has 20 patients. The Brief 
Psychiatric Rating Scale (BPRS) is administered to these patients to assess severity of 18 
symptoms such has hostility, suspiciousness, hallucination, and grandiosity. Each symptom is 
scored on a |- to 7-point scale where higher numbers indicate higher severity. The score on all 


18 symptoms are then summed to yield a total score with a possible range of 18 to 126. The 
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BPRS is administered at baseline and then at weekly intervals for eight weeks, yielding 9 total 
measures. The data are time-structured such that all patients have the same number of 
measurement occasions. The goal of the analysis is (a) to determine if patients are still improving 
at the conclusion of the study and (b) to determine whether there is a difference in the patients’ 
improvement between the two treatment groups. 

Figure | shows the empirical trajectory plots for all 40 men by treatment group. The plots 
show a clear curvilinear relation with the variance appearing to decrease over time. Based on this 
information, we fit a model with a quadratic polynomial term for Time and used a heterogeneous 


residual structure to reflect to decreasing variability over time. Specifically, the model fit to the 


data was 
BPRS, = Poi + f,Time, + ,,Time; +e, 
Boi =o + Vol reat, + Ug; (5) 
Pi =o + 7, Treat, +; 
Po; = Vo + YT reat; + Uy, 
where 
0 Oy. 
u; ~ MVN || 0}, Pr, Poo 0, (6) 
0} | PrP P2192, G3 
and 


e,~N (0, a exp| aime, 1) (7) 


Equation 5 shows that BPRS scores are a function of an intercept, linear slope, and quadratic 
slope. Time was coded so that the last time-point was equal to 0 and the first time-point was -8 in 
order to directly explore behavior of BPRS scores at the conclusion of the study (e.g., the 


intercept corresponds to the average difference at the end of the study, rather than at the 
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beginning). The equations for each of these coefficients includes a fixed effect, a treatment 
effect, and a random effect to allow the coefficient to be person-specific. The random effects are 
assumed to be normally distributed with a zero mean vector and an unrestricted covariance 
matrix. In this example and throughout this paper, we focus on parameterizing the unrestricted 
structure using standard deviations and correlations rather than variance and covariances to 
facilitate identifying whether off-diagonal elements are out of bounds (Stroup et al., 2018). The 
residuals are assumed to be normally distributed with a heterogeneous variance that changes as a 
function of time.” 

[Figure | about here] 

We fit the model in SAS Proc Mixed with restricted maximum likelihood estimation and 
Kenward-Rogers degrees of freedom (Kenward & Roger, 1997) to best account for the smaller 
sample size (McNeish, 2017). Estimation was conducted with SAS default options which include 
50 iterations each with a maximum of 150 optimization calls per iteration and convergence 
determined by a relative change in the Hessian matrix being less than le-8. Boundary constraints 
are also included by default to prevent variances from being below 0 and to prevent the 
magnitudes of correlations from exceeding 1. The results showed that the random effect 
covariance matrix was nonpositive definite. Specifically, the linear slope variance was estimated 
to be 0, which led to undefined correlations between the intercept and linear slope and between 


the linear slope and quadratic slope. Following the common troubleshooting approach, we 


? We began with a heterogeneous diagonal structure for the residuals, but because there are a fairly large number of 
repeated measures, this structure required 9 parameters and was not parsimonious. The structure shown in Equation 
7 accomplishes the same goal with only 2 parameters and was more parsimonious based on the BIC (2462.0 vs. 
2460.0). 


3 This result is unaffected by parameterizing the unrestricted structure with variances and covariances or standard 
deviations and correlations. We also fit the model with a homoskedastic residual structure to confirm that the issue 
was not attributable to the heterogenous variance structure, and the issue remained. 
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simplified the random effect covariance matrix by removing the quadratic random effect (the 
highest-order term). 

The average fitted curve for each group is shown in Figure 2 and the model estimates 
using the simplified random effect structure are shown in Table 1. To answer the first research 
question about differences between groups after 8 weeks, the fixed effect of treatment is not 
significant (7 =5.68,t (39.8) =1.59, p =.12), indicating that the means of the groups are about 
the same at the end of the study. To answer the second research question about the rate of change 
at the end of treatment, the results show that the instantaneous slope of Treatment | is negative at 
8 weeks after baseline but the p-value is just above the threshold for statistical significance ( 


y =—1.15,t(173) = -1.93, p =.05 ). The Treatment Group 2 instantaneous slope at Week 8 is 
positive and statistically significant (v =1.35, t(173) = 2.26, p =.03), meaning that Treatment 2 


appears to have worn off and that scores are increasing at the conclusion of the study. The 
difference between the instantaneous slopes across Treatment Groups is significant 


(¢= 2.50,t (173) = 2.96, p <.01). The quadratic slope was positive and statistically significant 
both for Treatment | (v= 0.20,t(278) = 2.28, p <.01) and Treatment 2 
(vy =0.41, t(278) = 5.88, p <.01) and there was a significant difference between the quadratic 


slopes such that the growth trajectory for Treatment 1 showed more significantly more curvature 


(y= 0.21,1(278) = 2.16, p =.03). Overall, we conclude that Treatment | is no longer 


significantly improving at the conclusion of the study whereas Treatment 2 has reversed course 
and BPRS scores are increasing at the conclusion of the study, which is attributable to the 
significantly greater bend in the growth trajectory. 


[Table 1 about here] 
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[Figure 2 about here] 
Approaches to Improve Convergence 

The previous section noted showed an example of how convergence difficulties can 
prompt a researcher to respecify their model from what was originally intended. While 
simplification of the random effects structure is typical, several alternative approaches to model 
re-specification are also common to varying degrees. We review these here prior to exploring our 
preferred solution — the factor analytic structure — in greater detail. 
Simple Rescaling 

One potential cause of estimation difficulties can be differences in the scales of the 
predictors. For instance, when higher-order polynomials are added to growth models, the values 
of the second-order Time” variable have a wider range than the first-order Time variable. For 
instance, in the BPRS data, the fixed effects are in units of Time and Time’, which range from -8 
to 0 and 0 to 64, respectively. In turn, the variance and covariance parameters associated with the 
random effects of Time and Time? are likely to be quite different in magnitude, and this makes 
estimation difficult (Cheng et al., 2010). Thus, rescaling predictors to have more similar ranges is 
one way to potentially improve convergence to a proper solution. For instance, in the BPRS 
data, we could have rescaled Time to range from -1 to 0 such that a one-unit change corresponds 
to the entire window of observations. In turn, Time? would range from 0 to 1. This change of 
scale might help to stabilize estimation, improving the likelihood of convergence; however, it is 
less likely to help to avoid nonpositive definite solutions. 
Centering 

Centering Time by subtracting its mean is one specific way to rescale the predictors to 


prevent higher-order polynomial terms from growing much more quickly than lower-order terms 
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(i.e., the range of Time? would only be 0 to 20.25 instead of 0 to 64 if Time were mean-centered; 
Schuster & von Eye, 1998). Unlike many other potential rescaling options, centering also has the 
advantage that it greatly reduces collinearity between polynomial terms or removes it entirely in 
time-structured data (Hedeker & Gibbons, 2006 p. 86), potentially further improving 
convergence and reducing the likelihood of a nonpositive definite solution. 

Centering is not always optimal for interpretation as it necessarily shifts the intercept to 
the midpoint of the observation window, seldom a location of substantive interest (Raudenbush, 
2001). For instance, in the BPRS data, Time was meaningfully coded to be zero at the end of the 
observation window to facilitate evaluating individual and group differences at the end of the 
trial, so centering would affect the ability to address the original research question as a direct 
parameter in the model (the endpoint could be tested with any centering scheme through linear 
combinations of parameters, however). 

Orthogonal Polynomial Model 

Orthogonal polynomials originating from analysis of variance have been suggested as 
another method that rescales Time (Hedeker & Gibbons, 2006 p. 84). Applying this method 
requires some data preprocessing to transform the time values into orthogonal polynomial values 
(e.g., Hedeker & Gibbons, 2006 p. 88), but it has the benefit of putting the polynomial terms for 
Time variables onto the same scale and removing the correlation between them. Specifically, if T 


is the matrix of time values (e.g., intercept, Time, and Time? for a quadratic growth model), the 
orthogonal polynomial value matrix is calculated by T(s" iy where S is the Cholesky factor of 
the symmetric matrix, T'T . A benefit of orthogonal polynomials is that nonpositive definiteness 


related to random effect correlations greater than 1 can be alleviated, although lower boundary 


issues pertaining to random effect variances near O can persist. Nonetheless, similar to centering, 
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interpretation can be difficult. Consequently, Biesanz, Deeb-Sossa, Papadakis, Bollen, and 
Curran (2004) suggest that orthogonal polynomial models are best understood graphically but 
ultimately caution against their use (although the coefficients can be translated to correspond to 
the original time metric, Hedeker & Gibbons, 2006, p. 91). 

Cholesky Covariance Structure 

Whereas centering and orthogonal polynomials are potentially useful when trying to 
address non-positive definiteness in models that include random effects of higher order terms 
(e.g, Time’), these solutions are not as applicable for models that do not include higher-order 
polynomial terms. An alternative and fully general strategy that is sometimes suggested is to 
reparameterize the random effects covariance matrix via a Cholesky decomposition. 

As noted in the introduction, not every symmetric matric conforms to the properties of a 
covariance matrix. As a result, direct estimation of an unrestricted structure may result in a 
nonpositive definite and therefore inadmissible estimate of the random effects covariance matrix, 
unless constraints are applied to ensure positive semi-definiteness (Pourahmadi, 1999). 
Reparameterizing the covariance matrix in terms of a Cholesky decomposition, however, ensures 
at least semipositive definiteness without requiring constraints on the parameters. 


The Cholesky decomposition (L) is the matrix square root of G with r(r+l)/ Z 


parameters where G = LL’ .* Rather than estimating G directly, the goal is to estimate the 
elements of L and then multiply L by itself (transposed) to create G. The benefit is that none of 
the elements in L need to be constrained but LL" is guaranteed to be at least semipositive 


definite, similar to how the square of any real number (positive or negative) cannot be negative. 


4 There is also a modified Cholesky factorization that estimates the matrix square root of the correlation matrix and 
an r-dimensional diagonal matrix of variances ( ) such that G = LIL’ (Daniels & Zhao, 2003). 
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Structuring of the covariance matrix with a Cholesky decomposition can still produce a 
nonpositive definite matrix where one or more diagonal terms are exactly zero, however, so a 
Cholesky decomposition only guarantees semi-positive definiteness rather than positive 
definiteness. A Cholesky decomposition for the random effect covariance matrix is 
commonplace in software (e.g., Hedeker & Gibbons, 1996; Bates, Machler, Bolker, & Walker, 
2014) because it increases stability of estimation, is less susceptible to rounding problems, and is 
computationally expedient (Lindstrom & Bates, 1988, SAS Institute Inc., 2018a, p. 3730)._The 
idea of making variance scale magnitudes more congruent to ease numerical stability of 
estimation has been noted previously. Kohli et al. (2019) discuss its effectiveness (and often, its 
necessity) in piecewise models with random knots. Additionally, McNeish, Dumas, and Grimm 
(2019) describe reparameterizing nonlinear models to reduce scale magnitude differences 
between linear parameters and those that appear in exponents. 

Although the Cholesky decomposition is one general solution to the problem of 
nonpositive definiteness, it may still encounter estimation difficulties and fail to produce a 
converged solution. A different covariance structure approach — namely, the factor analytic 
covariance structure — shares some of the desirable properties of a Cholesky decomposition, but 
also has some unique properties of its own. However, the method of using a factor analytic 
covariance structure has not been widely recognized among psychologists and, as we argue, 
deserves greater consideration. The next section covers details of the factor analytic covariance 
structure and how it is suited to assistance with convergence issues like those observed in the 
BPRS data. 


The Factor Analytic Covariance Structure 
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Factor analysis is well-known in psychology as a method to test hypotheses involving 
latent variables (Brown, 2015; Mulaik, 2009). Though this is a common substantive interest, 
factor analysis at its core is a mathematical decomposition of the covariances between variables 
(Meyer, 2009). That is, when psychologists test global fit of factor analysis models with a 
likelihood ratio test, the null hypothesis posits that the covariance matrix implied by the model is 
equal to the observed covariance matrix of the variables. Factor analysis ultimately boils down to 
reproducing a covariance matrix based on hypothesized common factors that explain relations 
between observed variables. This property of factor analysis makes it potentially useful as 
computational tool for decomposing a random effect covariance matrix, even if there is not 
necessarily theoretical interest in common factors underlying the random effects (Jennrich & 
Schluchter, 1986; Wolfinger, 1996). 

To clarify, the typical unrestricted structure as featured in Equation 6 directly estimates 
the standard deviation and correlations (or the variances and covariances) between random 
effects. As a conceptual path diagram, this can be represented by the top panel of Figure 3. Each 
variance or covariance is an explicit parameter, and these parameters are estimated directly. This 


results in r(r + 1)/ 2 unique parameters being estimated for r random effects. On the other hand, 


the factor analytic approach decomposes the covariance between the random effects with, in this 
case, a one-factor model, whose path diagram is shown in the bottom panel of Figure 3. The 
factor analytic structure features no direct estimates of random effect variances or covariances. 
Instead, it estimates loadings (A) and uniquenesses (d) to decompose the covariances between 
the random effects. These loadings and uniquenesses are then combined to yield each element of 
the random effect covariance matrix such that G= AA’ +D where Ais a rx1 vector of factor 


loadings and D is a rxr diagonal matrix of uniquenesses (readers may recognize this as the 
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model-implied covariance from factor analysis when the factor covariance is an identity matrix). 
Note that the uniquenesses in D are not the variance of the residuals from the growth model 


discussed earlier in R, (which reside at Level 1) but instead are parameters associated with the 


variability in the random effects that are not accounted for by the loadings in A (which reside at 
Level 2). The factor analytic structure requires 2r parameters (r loadings and r uniquenesses) to 
fill the complete random effect covariance matrix.” 
[Figure 3 about here] 
To make this explicit, consider the quadratic polynomial growth model from the BPRS 
data where r = 3. The factor analytic approach would estimate 3 loadings and 3 uniquenesses 


which would be combined in the following way to populate the random effect covariance matrix 


Ag + dy 
G=AAT+D=| 44, A? +4, (8) 
Ayhy AA, As +d, 


Each variance is equal to the loading for that random effect squared plus the uniqueness for that 
random effect. Each covariance is equal to the product of the loadings for the two random effects 
that are involved. In this way, each element still takes on a unique numerical value as in the 
unrestricted structure; however, these elements are not directly estimated but rather are formed 
from estimates of a factor analytic decomposition of the random effects. Just like how we can 
parameterize the covariance as a function of standard deviations and correlations, the factor 
analytic structure parametrizes the matrix as a function of loadings and uniqueness. The end 


result for each element is still a variance or a covariance, but the parameters estimated and 


> The dimensions listed here represent a factor analytic structure with one factor. As we discuss later, it is possible to 
use multiple factors, in which case the dimension of A would be rxq and the number of parameters would be 


rq+r for g the number of factors. 
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combined to arrive at that variance or covariance are structured differently. The process of 
decomposing the random effect covariance matrix with a factor model would seem to make the 
estimation process more complex since it is a roundabout way of arriving at the same 
information that is directly estimated by an unrestricted structure but it has three benefits that can 
help to avoid nonpositive definiteness. 

First, the number of parameters required for a one-factor structure (27) scales more 


slowly than the number of parameters needed for an unrestricted structure or a Cholesky 


structure (r(r + 1) /2). With 3 random effects, the minimum for which the factor analytic 


structure would typically be identified, the number of parameters is 6 and is the same for a factor 
analytic structure and an unrestricted model.® With more than 3 random effects, however, the 
one-factor structure has fewer parameters, reducing the dimensionality of the matrix while 
continuing to model variances and covariances of all random effects (Vermunt, 2007, p. 142). 


For instance, a model with 5 random effects would require (5 x 6) /2=15 covariance parameters 


with an unrestricted structure but only 2x5 =10covariance parameters with a factor analytic 
structure. If the data cannot support as many covarying dimensions as desired with an 
unrestricted structure, a factor analytic structure can reduce the number of dimensions without 
requiring that any random effects necessarily be removed from the model. 

Second, “0” estimates with the factor analytic structure do not necessarily result in 
nonpositive definiteness as they can with an unrestricted structure (Meyer, 2008). For instance, if 


the estimate of d, in Equation 8 were 0, this would not automatically imply a nonpositive 


definite covariance matrix. With an unrestricted structure where variances or standard deviations 


® With 2 or fewer random effects the one-factor factor analytic structure would be under-identified, having more 
parameters than the unrestricted structure, and this over-parameterization would typically be signaled by a non- 
positive definite Hessian matrix. 
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are directly estimated, a 0 on the main diagonal results in nonpositive definiteness. With a factor 
analytic structure, a 0 estimate for d, would indicate that squaring J, can alone reproduce the 


variance for the random effect. D would no longer be full rank, but that is not prohibitive for 
convergence because the ultimate goal is that G is positive definite. In other words, A and D are 
simply a method to populate G so there is more flexibility in what constitutes an in-bounds 
estimate compared to an unrestricted structure. This property can be useful with smaller samples 
or many random effects where uncertainty is higher because imprecision that can result in 
nonpositive definiteness with an unrestricted structure does not necessarily have the same 
adverse effect for a factor analytic structure (Meyer & Kirkpatrick, 2005). 

Third, and similar to a Cholesky structure, the factor model operates on the square root 
matrix of the covariance matrix (Oman, 1991), which has benefits related to scale magnitude 
differences between parameters (Hedeker & Mermelstein, 1998). That is, parameters whose 
variables are on scales with different magnitude makes estimation of unrestricted covariance 
matrices difficult (Kiernan, 2018) because of the way that parameters are typically updated 
across iterations using the gradient vector and the Hessian matrix of the likelihood. Similar to 
centering or orthogonal polynomial approaches, working on the square root matrix makes 
parameters more stable to estimate (Cheng et al., 2010; Gibbons & Bock, 1987). 

In the next section, we reanalyze the motivating data with a factor analytic approach to 
show how it can reduce convergence issues. Afterward, we carry out a simulation study to 
quantify its potential benefits to expel nonpositive definite random effect covariance matrices. 


Reanalysis of Motivating Example 
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We return to the motivating BPRS data we discussed earlier. We fit the same model as in 
Equation 5 but this time using a factor analytic structure for the random effect covariance matrix 


such that 


0] | A; +d, 
u,~MVN|/0|,| 44, A +d, (9) 
O] | AA, AA A td, 
The residual structure is again heteroskedastic as a function of time as in Equation 7. We fit the 
model in SAS Proc Mixed using restricted maximum likelihood estimation with Kenward- 
Rogers degrees of freedom with the SAS default convergence options; in SAS, a one-factor 
structure can be requested in the RANDOM statement by specifying TYPE = FA(1). 

Whereas the model previously had issues with nonpositive definiteness using the 
unrestricted structure in Equation 6, with a factor analytic covariance structure, the model 
converged without issue. The average fitted trajectories are identical to those shown in Figure 2 
and estimates for the model with a factor analytic covariance structure are shown in Table 2. The 
factor analytic structure does not directly estimate the individual elements of the covariance 
structure, so some calculation is needed to transform the estimates in Table 2 into the random 


effect covariance matrix: 


Ap +dy 6.95° +83.14 131.44 
G=| 4A, 4 +d, =| 6.95x2.16 2.167+0 =| 15.01 4.67 
Ayhn AA, AZ td, 6.95x0.21  2.16x0.21 0.21° +0.03 1.46 0.45 0.07 


1 
For ease of reference, the corresponding correlation matrix is |.61 1 . These estimates 
48 .79 1 


reveal some meaningful information that is omitted when using an unrestricted structure, such 


being able to quantify the quadratic slope variance and the correlations between the random 
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effects. Not being forced to constrain particular elements in order to achieve convergence also 
leads to a significant improvement in the restricted loglikelihood relative to the simplified model 
in Table 1, y7(3) =13.0, p <.01 and an improvement in the BIC (2452.7 for the factor analytic 
structure vs. 2460.0 for the two random effect unrestricted structure). 

[Table 2 about here] 

Being able to model all possible elements in the random effect covariance matrix impacts 
the fixed effect standard errors and consequently, the p-values associated with the fixed effects. 
Despite the fact that the fixed effect point estimates are extremely close between covariance 
structures in Table 2, the standard errors and p-values differ. As a result, many of the conclusions 
related to the second research question presented earlier no longer hold. For instance, the 
instantaneous slope at Week 8 in Treatment Group 1 is now clearly not statistically significant ( 
y =-1.17,t(36) =—1.75, p =.09 ), making it clearer that growth has leveled off by Week 8 and 
that this group is no longer improving at the study’s conclusion. The p-value for the 
instantaneous slope of Treatment Group 2 at Week 8 now slightly exceeds .05 
(yv =1.35, (36) = 2.02, p =.051), dampening possible evidence that this group is getting worse at 
the conclusion of the study. The difference in the quadratic slopes is also no longer statistically 
significant (v =—0.22,t(40.1) =1.72, p =.09 ), indicating that there is no difference in the 
curvature of the growth trajectories between treatments. 

Essentially, all of the previously significant effects related to the second research 
questions are no longer significant when all elements of the random effect covariance matrix are 
retained with the factor analytic structure. Of course, this is a single empirical example and it is 


impossible to determine which set of results is actually more accurate. To test this more 
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rigorously, we perform a simulation in the next section to compare the performance of the 
unrestricted and factor analytic structures for the random effect covariance matrix. 
Simulation 

Simulation Design 

To determine if the findings from the BPRS example hold more generally and to compare 
performance of different approaches to addressing nonpositive definiteness, we performed a 
Monte Carlo simulation study. There is an extensive literature surrounding Cholesky 
decompositions of covariance matrices to make computation more expedient or to allow for 
unconstrained estimation (Pinheiro & Bates, 1996; Elzo, 1996) in addition to related 
decompositions for specifying prior distributions in Bayesian analyses (Barnard, McCulloch, & 
Meng, 2000; Liu, Zhang, & Grimm, 2016). These issues were salient in years past when 
computation time for random effects models was more problematic, but there has been very little 
exploration related to the practical question of whether employing different approaches 
ultimately leads to improved convergence and performance compared to directly estimating the 
elements of an unrestricted covariance matrix (for exceptions that include practical comparisons 
as secondary interests, see van der Elst et al., 2016 and Lu & Mehrotra, 2010). 

We generated data from a two-group quadratic growth model whose population values 
were taken from rounding the BPRS estimates presented previously. We simplified the residual 
covariance matrix to be a homogeneous diagonal rather than heteroskedastic. The statistical 


model used to generate data is shown in Equations 10 and 11. 


VY, = fy + B,Lime, + B,,Time; + e, 
fy; =30+ 5.0(Treat, ) + Up; 

fB, =-1.0+ 2.35(Treat, ) us 

f,, =0.20+ 0.20(Treat, ) Hills: 


(10) 
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u,~ MVN|/0},, 15 5 , e,~N(0,15) (11) 
0] [1.5 0.50 0.07 


Measurement occasions were generated to be time-structured such that each person had the same 
number and timing of measurements. Diallo et al. (2014) noted that the number of repeated 
measures was closely related to convergence in quadratic growth models, so we included three 
repeated measure conditions: 4, 6, and 8. We included two variations of the 4 repeated measure 
conditions with different timing. In the “narrow” 4 repeated measure condition, timing 
corresponded to the final 4 time-points of the 8 repeated measure condition (i.e., time was coded 
-3, -2, -1, and 0). In the “wide” 4 repeated measure condition, the window of observation was 
equal to the 8 repeated measure condition but with half as many measurement occasion (i.e., time 
was coded -6, -4, -2, 0). We also included three different sample sizes: 50, 100, and 200. We 
focus on the smaller end of the sample size spectrum because this is where issues associated with 
nonpositive definiteness are most common. For each combination of sample size and repeated 
measures, we created 1000 datasets from the data generation model using SAS Proc IML. To 
each generated dataset, we fit the model in Equation 10 in SAS Proc Mixed using restricted 
maximum likelihood estimation and the default convergence and boundary constraint options. 
SAS files used to conduct the simulation are provided on the first author’s Open Science 
Framework page (https://osf.io/xhfqw). 
To each generated dataset, we fit models with 4 different parameterizations: 
1. An unrestricted random effect covariance structure with a standard deviation- 


correlation parameterization 
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2. Re-expressing Time and Time? via orthogonal polynomials and using an 
unrestricted random effect covariance structure with a standard deviation- 
correlation parameterization 

3. An unrestricted random effect covariance structure parameterized by a Cholesky 
decomposition 

4. A factor analytic random effect covariance structure based on one-factor 

As in the analysis of the example data, for replications that did not converge using the 
unrestricted structure with a standard deviation-correlation parameterization, we fit a model that 
removed the random effect for the quadratic slope so that the random effect covariance matrix 
was 2x2 and only included the intercept standard deviation, the linear slope standard deviation, 
and the correlation between the two to explore the statistical properties of removing random 
effects when it may not be warranted. Refitting the model with fewer random effects was only 
done for the first parameterization in the above list. 

Outcomes 

The first outcome of interest was the convergence of the fitted models. Nonpositive 
definiteness of the estimated random effect covariance matrix was the main interest and the main 
source of convergence issues, but we also counted other convergence issues such as a 
nonpositive definite Hessian matrix or an infinite likelihood. 

Second, we looked at the relative bias of the random effect standard deviations and 
random effect correlations in replications that were able to converge for each type of covariance 
structure. This outcome was of interest because the ability to achieve convergence is necessary 
but not sufficient for recommending use of a particular method. A method that converges but 


yields poor estimates is not necessarily an improvement, so this outcome will help address 
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whether the alternative covariance structures increase, decrease, or has no effect on the quality of 
the covariance structure estimates. The random effect covariance population values vary widely 
in their magnitude, which can affect the comparability of relative bias across parameters. For this 
reason, we report the relative bias of the random effect correlations instead in attempt to 
somewhat standardize the relative bias values across the different off-diagonal elements. A 
similar rationale is behind the choice to present relative bias of standard deviations. 
Third, we will report the relative bias of the fixed effect estimates and the coverage of the 

95% confidence interval for each of the fixed effects. Relative bias of the fixed effect estimates 
is not expected to be affected by the specification of the random effect covariance matrix with 
continuous outcomes (Liang & Zeger, 1986) but we will inspect this metric as a precaution. 
However, the standard errors of the fixed effects may be adversely affected. To explore this 
possibility, we inspect the proportion of replications in which the population value falls within 
the estimated 95% confidence interval for each fixed effect parameter. If there is minimal 
relative bias of the fixed effects, this measure becomes an assessment of the standard errors. If 
the standard errors are appropriately estimated, the confidence interval coverage rate should be 
near 95% with values between 93% and 97% typically being considered reasonably close to 95% 
(Bradley, 1978). If the standard errors are too small, the coverage rate will be below 93%; if the 
standard errors are too large, the coverage rate will exceed 97%. To maintain consistency with 
the approach commonly taken in empirical studies, we will report these metrics for 

1. The converged replications of the unrestricted random effect covariance structure using a 

standard deviation-correlation parameterization 
2. The model with only two covarying random effects for the replications where the 


unrestricted matrix did not converge 
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3. The converged replications of the factor analytic structure 
4. The converged replications of the Cholesky decomposition 
Results 

Convergence Rates. Table 3 shows the percentage of replications that converged across 
each of the simulation conditions. As noted by Diallo et al. (2014), convergence was highly 
influenced by the number of repeated measures in the data with a higher number of repeated 
measures being associated with higher convergence rates. Sample size was also related to 
convergence rates such that convergence generally improved with larger samples, though this 
effect was weaker than the effect of the number of repeated measures. 

[Table 3 about here] 

Beyond these general trends that have been noted in previous studies, the salient finding 
in Table 3 is the difference in convergence between using an unrestricted structure and 
alternative parameterizations of Time or the covariance structure. With 8 repeated measures, the 
factor analytic structure converged in nearly every replication, with orthogonal polynomials and 
the Cholesky structure not far behind (mid 90s to 100 percent convergence). On the other hand, 
the unrestricted structure converged only about two-thirds of the time with 8 repeated measures. 
With 6 repeated measures, the convergence rate of the factor analytic structure was not perfect 
but remained noticeably higher (high 70s to mid 90s, depending on sample size) compared to 
either the orthogonal polynomial or Cholesky structure (high 50s to low 80s) or the unrestricted 
structure (mid 20s to high 30s). With 4 narrowly spaced repeated measures, the convergence of 
the factor analytic structure was not great (mid 40s to high 50s) but nonetheless was a moderate 


improvement over orthogonal polynomials or a Cholesky structure (low to mid 40s) and all three 
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were a vast improvement over a unrestricted structure in the same conditions (whose 
convergence did not exceed 20%). 

As anticipated by the previously reviewed biostatistics literature, a factor analytic 
structure clearly demonstrates fewer obstacles to convergence compared to either unrestricted 
structure or an orthogonal polynomial model across conditions, especially with moderate sample 
size and a moderate number of repeated measures. Though the ability to increase convergence 1s 
useful, the utility of such an increase is only relevant if the quality of the estimates is also 
satisfactory. The next subsection reports on the bias of the covariance parameters across 
simulation conditions to explore whether the improvement in convergence is concomitant with 
quality parameter estimates. 

Covariance Parameter Estimate Relative Bias. In this section, we look at the ability of 
the unrestricted, Cholesky structure, and factor analytic structures to recover the population 
values for the random effect covariance matrix. We do not report results for orthogonal 
polynomials, as the estimates are on a different scale than other methods due to the different 
coding of time. To keep results succinct, we only present the narrow 4 repeated measure 
condition where the greatest differences were observed. 

Table 4 shows the relative bias of the random effect standard deviations and random 
effect correlations across simulation conditions. The relative bias values in Table 4 for each 
covariance structure are only calculated for replications that converged, so the number of 
replications is not constant across columns. For instance, the UNR column for NV = 50 and RM = 
4 is based on 140 replications (14% of the 1000 possible replications that converged as reported 
in Table 3) whereas the FA column for N =50 and RM = 4 is based on 440 replications (44% of 


the 1000 possible replications that converged as reported in Table 3). 


REDUCING NONPOSTIVE DEFINITENESS 31 


[Table 4 about here] 

A salient finding in Table 4 is that the relative bias for the random effect standard 
deviations with a factor analytic structure were generally equal to or lower than those from an 
unrestricted structure or Cholesky structures. This is notable because the factor analytic structure 
was able to converge more often than an unrestricted or a Cholesky structure and provided 
estimates that were closer to the population standard deviation values, on average. No detriment 
was observed in the random effect standard deviations even when factoring in the additional 
replications that were unable to converge with the unrestricted structure. 

The relative bias of the random effect correlations does not strongly support one method 
being superior to the other because the best method is split across conditions. The relative bias of 
the correlation between random intercepts and random quadratic slopes was typically worse for 
the factor analytic structure. However, this finding appeared to be related to convergence. The 
scale magnitudes of the intercept and the quadratic slope were the most discrepant, meaning that 
this parameter was the most difficult to estimate. With the unrestricted structure, difficulty with 
this parameter may simply have triggered a nonconvergent replication. Conversely, the factor 
analytic structure would have been more likely to provide an estimate for this parameter, but the 
difficulty may have manifested in more biased estimates rather than nonconvergence. When 
considering only replications that converged, the factor analytic structure was always better than 
the unrestricted structure for estimating random effect correlations with 4 repeated measures, 
about the same as the unrestricted structure with 6 repeated measures, and closer to (but still 
slightly worse than) the unrestricted structure with 8 repeated measures. 

So far, the factor analytic structure has shown that it improves convergence while 


moderately increasing the quality of the covariance parameter estimates compared to the 
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unrestricted structure. The next subsection inspects the confidence interval coverage rates when 
employing each of the different covariance structures. 

Fixed Effect Relative Bias and Confidence Interval Coverage Rates. As anticipated, 
the relative bias of the fixed effect estimates was negligible across all conditions and was not 
different across the different covariance structure conditions. We do not report this information 
in the interest of brevity but do note that this signifies that any issues with confidence interval 
coverage rates is solely attributable to inaccuracies in the standard errors. 

Table 5 shows the 95% confidence interval coverage rates for each fixed effect across the 
simulation conditions. For succinctness and comparability, we reported on the same conditions 
as Table 4. Reported values are the coverage of the 95% confidence interval, so coverage is 
expected to be near 5% if standard errors are accurately estimated. Though none of the fixed 
effects in the present data generation model are null, these coverage rates provide a good 
indication of what we would expect the Type-I error rate to be for a null effect under these 
conditions. If, for a null effect, (1 — Coverage Rate) > 0.05, then the Type I error rate is elevated 
above the nominal level. Thus, coverage rates under 95% for non-null effects suggest a 
corresponding elevation in Type-I errors for null effects without having to explicitly include null 
effect in the data generation model. The unrestricted structure (the UNR column), Cholesky 
structure (the CD column), and the factor analytic structure (the FA column) display the 
coverage rates for the replications that converged. The version of the model that omits the 
quadratic slope random effect (the UNR2 column) includes replications for which the full 
unrestricted structure matrix did not converge. This was done to mimic the common protocol of 
simplifying the random effect covariance matrix when the full unrestricted model fails to 


converge. 
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[Table 5 about here] 

From Table 5, the unrestricted structure yields reasonable coverage rates across 
conditions, as would be expected since this model directly corresponds to the data generation 
model. However, as noted in Table 3, this model can have convergence issues and may not 
reliably converge to a solution in some conditions. In such cases where the random effect 
covariance needs to be simplified to achieve convergence, the confidence intervals were too 
narrow for some parameters (coverage rates below 93%), particularly when there are more 
repeated measures or the sample size is larger. This suggests that the Type-I error rate for null 
effects would also be elevated. On the other hand, the factor analytic and Cholesky structures 
consistently produced fixed effect coverage rates between 94 and 97%, suggesting that Type-I 
error rates would also be well controlled. 

Summary. Alternative parameterizations such as orthogonal polynomials, Cholesky 
decompositions, or a factor analytic structure are not a cure-all solution to all estimation issues 
related to nonpositive definiteness. However, they vastly boost convergence, improve estimation 
accuracy for the random effect variances, do not adversely impact estimation of the random 
effect correlations, and generate accurate confidence intervals for fixed effects across sample 
size and repeated measure conditions. The factor analytic and Cholesky structures do not alter 
the ability to get unique estimates for all elements of the random effect covariance matrix — they 
merely change how those estimates are obtained and, in doing so, improve many aspects of 
estimation quality. Therefore, if nonconvergence issues related to nonpositive definiteness are 
encountered, these reparameterizations should be considered prior to removing random effects or 


constraining random effect correlations to 0. 


REDUCING NONPOSTIVE DEFINITENESS 34 


Extension to Modeling Cross-Sectionally Clustered Data 

Though our focus has been on growth models because issues with estimating the variance 
of quadratic slopes is widely noted, the convergence issues we discuss similarly extend to cross- 
sectional data where people are nested within higher-level units. Consider the seminal example 
provided in Raudenbush and Bryk (2002). The example features a subset of the 1982 High 
School and Beyond data set where the outcome variable is Mathematics Achievement for 6,774 
students within 160 different schools. We use a slightly different subsample than used in 
Raudenbush and Bryk (2002) to highlight issues central to this paper’; for the 97 schools with 
fewer than 50 students, we used all students as used in Raudenbush and Bryk (2002). For the 63 
schools with more than 50 students, we randomly sampled 50 students. This did not affect the 
unbalanced structure of the data — each school had a potentially different number of students and 
the median number of students per school (which was 47) was unaffected. The data include three 
within-school predictors (Sex, Minority Status, Socioeconomic Status [SES]) and two between- 
school predictors (Private School Status and School Size Divided by 100). 

We will follow a top-down model building strategy and fit a model featuring all 3 
student-level predictors with School Size forming a cross-level interaction with Female and 
Private School Status forming a cross-level interaction with both Minority Status and SES. All 
within-school predictors were group-mean centered and the school mean of each within-school 
predictor was also included to decompose effects into orthogonal within- and between-school 


components. The school means of the binary predictors were also grand-mean centered to 


7 The exact subsample from Raudenbush and Bryk (2002) converges using group-mean center for the model of 
interest, but not using grand-mean centering. Grand-mean centering can produce conflated estimates when random 
slopes are present (Rights et al., 2020), so we use a slightly different subsample to demonstrate the issue with the 
unconflated group-mean centering approach. 
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preserve the interpretation of the intercept. Further, we included random intercepts and random 
slopes for all 3 within-school predictors. In other words, the model would be 
MathAch, = By; + f,; (Female, — Female ; ) +B, (Minority, — Minority, ) + B, (SES, — SES, ) +e, 


Boj =Voo t YoiPrivate , + Yo (Size, = Size) + 


Vis (Female; = Female) sane (Minority, = Minority) + YosSES j +Ug; 


By; =VKio tha (Size, -Size)+u,, 
Po; = Vo + ¥Private ; + Uy, 


Bs; =Vyt 73,Private, + Us; 

(12) 
Where i indexes students and j indexes schools. We fit the model in SAS Proc Mixed with 
restricted maximum likelihood and between-within degrees of freedom because the sample size 
is large. We also calculated contextual effects for Female, Minority Status, and SES by 


subtracting the within-school coefficient from the between-school coefficient (e.g., 7%); —%. for 


Female). 
We initially used a 4x4 unrestricted structure for the random effect covariance matrix 


parameterized with standard deviations and correlations, 


By. 

P1900 hg 
P9229 P2191 0,” 
P53, P3193) P3233” 


u, ~ MVN (13) 


Sr 2 Or. 


However, this model was unable to converge due to a nonpositive definite covariance matrix 
attributable to the random effect correlation between Female and SES being inadmissible. 
Refitting the model after removing the Female random effect also resulted in a nonpositive 
definite random effect covariance matrix. Either removing the SES random effect (BIC = 


43,623) or both the Female and SES random effects (BIC = 43,610) were able to converge. 
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When removing both random effects as suggested by the BIC, the contextual effect for Minority 


Status (74 — 7%) = 1.16,1(6608) = 1.91, p =.057 ) is not significant. However, if only the SES 
random effect is removed the Minority contextual effect (74 — 7%) = 1.28,t(6608) = 2.09, p =.036 


) is significant. This highlights the increased arbitrariness of the simplification approach with 
cross-sectionally clustered data — it is not obvious which random slopes may be more or less 
imperative to retain in the model and inference for borderline effects can change depending on 
the choice. Further, these choices are not made based on theory but rather upon what will or will 
not converge. An additional caveat is that information criteria like BIC are a common guide but 
have been shown to be untrustworthy for accurately selecting among models with different 
covariance structures and BIC often prefers overly simple structures (Ferron et al., 2002; Gomez, 
Schaalje, & Fellingham, 2005; Keselman et al., 1999). Estimates and standard errors from these 
two models are provided in the left and center columns of Table 6. 
[Table 6 about here] 

The orthogonal polynomial approach was inapplicable to this example, owing to the lack 
of higher-order polynomial terms, leaving a Cholesky or factor analytic structure as possible 
methods to include random effects on all four within-school predictors. A Cholesky structure 


would specify that 


u, ~ MVN (0,LL") (14) 
where 
lL, 0 0 O 
pa{fo 4 9 0 ce 


and a factor analytic structure for the random effect covariance matrix would specify that 
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Ay Ap +d, 
Aho Ayhy Ay + d, 


Ayhy — Aghy gd Ag +s 


0 

0 
u,~MVN|| ||, (16) 

0 
Again, both the Cholesky structure in Equation 15 and the factor analytic structure in Equation 
16 are able to provide the same information as the unrestricted structure shown in Equation 13, 
in the sense that they calculate variances and covariances for all random effects. Given that the 
model has more than 3 random effects, the factor analytic structure also has fewer parameters (8) 
compared to the unrestricted or Cholesky structure (both have 10). 

The Cholesky structure was semidefinite positive but ultimately inadmissible because the 


estimate for /,, was exactly 0, resulting in an estimated SES slope variance of 0. However, the 


factor analytic structure converged and admissible estimates were obtained for all 8 covariance 
parameters with no issues. The results of the factor analytic structure are compared to the each of 
the simplified unrestricted structures in Table 6. We converted the factor analytic parameter 
estimates to variances and correlations to facilitate comparisons across the models. The results in 
Table 6 show that the contextual effect for Minority Status is significant ( 


Yor ~ Yop =1.32,1(6608) = 2.16, p =.031). 


This demonstrates some of the utility of the factor analytic structure, especially with 
many random effects. The traditional Cholesky approach was semipositive definite but had an 
inadmissible slope variance estimate for SES coefficient and the simplification approach could 
arrive at different conclusions depending on how the model were simplified without a reliable 
sense of which model should be preferred. Worse yet, there is rarely a guiding theory for 


importance of the different random slopes in cross-sectional data (unlike longitudinal data where 
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there is a clear hierarchy for which random effects to remove first). The factor analytic structure 
was able to estimate the between-school variance for all relevant predictors without issue. 
Discussion 

Maximal random effect structures are valuable theoretically but difficult to estimate 
successfully in practice, particularly when the number of observations for each independent 
sampling unit is limited (e.g., few repeated measures per person). As we argue in this paper, 
minor changes in how researchers parameterize a maximal random effect structure can have 
substantial impact on the ability of the estimation routine to converge to a solution. Factor 
analytic, Cholesky, and unrestricted structures all can be used to estimate a maximal random 
effect structure; however, our empirical examples and simulation showed that the factor analytic 
structure facilitates convergence while also improving the quality of the estimates whereas the 
unrestricted structure tends to struggle to overcome nonpositive definiteness. The theoretical 
underpinnings guiding selection of the factor analytic structure are that (1) it reduces some of the 
scale magnitude differences between variance-covariance parameters, (2) the loading and 
uniqueness parameters are less exposed to boundaries, and (3) it reduces the dimensionality of 
the covariance matrix when there are more than 3 random effects. We recommend that 
researchers consider employing a factor analytic structure for the random effect covariance 
matrix prior to attempting to simplify the covariance matrix if convergence may be an issue. 
Retaining superfluous random effects rarely causes harm (Verbeke & Molenberghs, 2000, p. 
127) but including too few random effects can result in biased estimates for the remaining 
random effect variance-covariance parameters, in turn biasing fixed effect standard errors and 


leading to potentially faulty conclusions (Agresti et al., 2004; Laird & Ware, 1982). 
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One somewhat unexpected finding from the simulation was the discrepancy between 
Cholesky and factor analytic structures. Cholesky approaches are commonly employed in 
software and applications and enjoy a wider literature while factor analytic structures remain less 
celebrated. The benefits of the Cholesky structure were evident in the simulation; however, the 
factor analytic structure had a non-trivial improvement in convergence and also reduced the bias 
of random effect standard deviations in nearly all the conditions included in our simulation. This 
suggests that there may be new avenues for research to explore if this finding is supported more 
generally and if the factor analytic structure may be researcher’s best chance to model a random 
effect covariance matrix when there are many random effects present. 

In terms of software implementation, the factor analytic structure is practical for users of 
some software programs as it is featured as a preprogrammed covariance structure option in both 
SAS Proc Mixed or Proc Glimmix [TYPE = FA(1)] and in SPSS MIXED (Covariance Type = 
Factor Analytic: First Order, Heterogeneous). The option is not provided in the standard Ime4 
package for mixed effect modeling in R and is not an option in Stata’s built-in mixed effect 
model commands. Although the factor analytic structure does not directly appear in the Ime4 R 
package for mixed effect models, the nlme package provides the option to use different versions 
of Cholesky decompositions, with a log-Cholesky parameterization being the default. Mplus 
does not provide preprogrammed covariance structures, but the factor analytic structure could be 
specified rather painlessly by embedding a constraints consistent with a factor analytic 
decomposition (similar to the bottom panel of Figure 3) viaa MODEL CONSTRAINT 
statement. 

We discussed the version of the factor analytic structure that decomposes the random 


effect covariance matrix using a single common factor. However, researchers are not limited to 
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only a single factor and can specify more factors if desired or they think that the complexity of 
the covariance structure cannot be accommodated by a single factor. In SAS, the TYPE = FA (q) 
statement that controls the covariance structure requests that g factors be used to decompose the 
random effect covariance matrix. We specified g = 1 in our examples, but larger values can be 
selected where g <r. Underspecifying g could lead to bias, but this bias is likely smaller than the 
bias that would be incurred by omitting random effects entirely and changing the dimension of 
the covariance matrix. If specifying a value for qg is a concern, one could use model comparisons 
to see if more factors are needed. 

Additionally, researchers can omit the uniquenesses from the factor analytic structure and 
rely solely on the loadings to parameterize the covariance matrix. Omitting the uniquenesses is 
typically referred to as an FAO structure, a benefit of which is that the result is ensured to be 
semipositive definite because the square of a loading cannot be below 0 (SAS Institute Inc., 
2018b, p. 6610). The FAO structure with the number of factors equal to the number of random 
effects (1.e., g = r) produces a Cholesky decomposition (SAS Institute Inc., 2018b, p. 6591). A 
potential downside of the FAO structure is that its reliance on only the loadings to pattern the 
covariance matrix is somewhat implausible for g < r, which might lead to bias. 

We focused on potential impacts to fixed effect standard errors, random effect covariance 
parameter estimates, and convergence. However, other aspects of mixed effect models are 
impacted by misspecification of the random effect covariance matrix. For instance, empirical 
Bayes predictions of random effects that are used to form person-specific trajectories include the 


nN 


estimated random effect covariance matrix in the calculation such that a, = GZ'V, '(y ‘ x p) : 


To the extent that the random effect covariance matrix is misspecified, G will be incorrect and 


could lead to inaccurate predictions for a,. Of course, if a random effect were removed to 
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appease convergence issues, then no empirical Bayes predications could be calculated for that 
effect. The extent to which alternative random effect covariance matrix parameterization could 
improve empirical Bayes predictions could be an additional avenue of future studies. 

Another future direction may be application to growth mixture models where 
convergence is a more prevalent and daunting problem (Bauer & Curran, 2003; Hipp & Bauer, 
2006). Infurna and Jayawickreme (2019) and Infurna and Luthar (2016) note that atheoretical 
simplification of the random effect covariance structure is rampant in empirical growth mixture 
modeling studies even though doing so adversely affects class enumeration and class assignment. 
McNeish and Harring (2020) have suggested marginal models as a way to improve convergence 
in growth mixture models, though a caveat of this approach is that researchers lose the ability to 
model person-specific curves. Applying a factor analytic covariance structure has been suggested 
for classification methods in general (McLachlan, 2011) and extending the idea to the random 
effect covariance matrix may present a possible way to improve convergence in growth mixture 
models while retaining the ability to obtain person-specific curves. 

Limitations 

First, the contexts for which mixed effect models are appropriate are vast and our study 
focused on a narrow segment of situations for which such models are appropriate. The results 
from our simulation were rather decisive in favoring the factor analytic structure over the 
unrestricted structure; however, this is not to say that these results would generalize broadly 
across contexts or even to other conditions within the contexts we studied. Aspects like 
individualized timing of measurement occasions, attrition, and varying numbers of repeated 
measures are common in longitudinal data but were not studied and could impact performance. 


Additionally, we provided a single empirical example to show that the same pattern of 
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convergence extends to cross-sectionally clustered data, but a full simulation would be required 
to quantify the degree to which a factor analytic structure may improve convergence and 
performance for data characteristics commonly found in cross-sectional data. This avenue could 
prove more beneficial as decisions about which random effects to remove in cross-sectionally 
clustered data are less straightforward than in longitudinal data and the ability to retain more 
theoretically interesting random effects could be useful for these models. 

Second, our simulation generated data such that the variances of the random effects were 
non-zero. In practice, models with a maximal random effect structure may yield nonpositive 
definite covariance matrices because one or more of the coefficients simply may not vary across 
people or predictors completely explain any variability that exists. Our results suggest that a 
factor analytic structure can improve stability in estimating covariance matrices that are positive 
definite in the population. However, when the population covariance matrix is nonpositive 
definite because some random effects truly have zero variance, changing the specification will 
not improve the probability that an estimation routine converges and researchers should not 
discount the possibility that some random effects simply might not have any variability. As a 
diagnostic, Bates et al. (2015) suggest fitting a principal components analysis using the random 
effect covariance as input to determine how many components can account 100% of the variance 
and to reduce the number of random effects accordingly. In R, the rePCA function in the Ime4 
package implements this method. 

Third, we focused on models with continuous outcomes but mixed effect models are also 
popular when outcomes are discrete and convergence presents a greater challenge in such 
contexts because estimation is far more difficult because the random effects do not integrate out 


of the likelihood (McNeish, 2016; Schoeneberger, 2016). Furthermore, because the random 
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effects cannot be integrated out of the likelihood, fixed effect estimates are susceptible to bias 
when there are misspecifications in the random effect structure (e.g., Bauer & Sterba, 2011; 
Blozis & Harring, 2018; Harring & Liu, 2016). A factor analytic structure can be similarly 
applied in mixed effects models for binary or count outcomes and is included as a 
preprogrammed option in mainstream software like SAS Proc Glimmix. Exploring whether a 
factor analytic structure can similarly improve convergence in models with discrete outcomes 
could serve as an interesting follow up to the current study. 

Fourth, we only considered restricted maximum likelihood estimation whereas full 
maximum likelihood estimation is another option that can be employed and is popular for models 
fit in structural equation modeling software. Restricted and full maximum likelihood are 
equivalent at larger sample sizes but will diverge at smaller sample sizes (McNeish, 2017). With 
smaller sample sizes, full maximum likelihood tends to provide downwardly biased estimates of 
variance components and fixed effect standard errors (Maas & Hox, 2005). From these 
properties, our speculation would be that nonpositive definiteness with full maximum likelihood 
would be encountered more often than with restricted maximum likelihood because downwardly 
biased variance component estimates would be more likely to be near 0. 

Concluding Remarks 

Retaining many random effects is theoretically desirable and allowing each of these 
random effects to covary is a recommended practice that increases the quality of estimated 
quantities. As many empirical researchers are aware, this is easier said than done and 
nonconvergence can quickly derail plans to use a maximal random effect structure. How 
researchers specify a maximal random effect structure can have pronounced consequences on the 


ability of the estimation routine to converge to a solution. The typical approach of using an 
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unrestricted structure is susceptible to scale magnitude differences and boundaries on the 
parameter space. Our simulations showed that this can lead to low convergence rates. Moreover, 
the typical remedy of simplifying the covariance structure can lead to poor confidence interval 
coverage rates for fixed effects (and an expected increase in the Type-I error rate for null 
effects). Instead, alternative parameterizations can help to mitigate some of these estimation 
issues. Ultimately, we found that a factor analytic structure best improves both convergence and 


performance. 
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Table 1 


Model estimates from BPRS data with simplified covariance matrix 


Effect Notation Estimate SE P 
Fixed Effects 
Week 8 Intercept, Group 1 Yoo 28.30 2.53 <.01 
Week 8 Intercept, Group2. /y) + % 33.97 “2:53.50 
AWeek 8 Intercept You 5.68 3.58  .12 
Slope at Week 8, Group 1 No -1.15 0.60  .05 
Slope at Week 8, Group 2 Yio tN 1.35 0.60  .03 
ASlope at Week 8 Nu 2.50 0.84 <.01 
Quadratic Slope, Group 1 129 0.20 0.07 <.01 
Quadratic Slope, Group 2 Yo) +04 0.41 0.07 <.01 
AQuadratic Slope Yn 0.21 0.10 .03 
Covariance Structure 
Intercept Variance Fo 118.91 
Linear Slope Variance Thy 2.06 
Intercept, Linear Correlation Pro 0.40 
Residual Variance, Week 8 o 11.80 
Heteroskedasticity Effect 7) —0.22 


Note: Treatment Group | was used as the reference group. The A parameters are calculated by 


51 


(Group 2 — Group 1). The Heteroskedasticity Effect is a log-linear model. Since Week 8 is coded 
as () so time is comprised of negative values, —0.22 means that that the residual variance at time f 


is e°” =1.25 times the residual variance at time t+1. For instance, the residual variance at 
Week 7 would be equal to the residual variance at Week 8 times 1.25, (11.80x1.25) =14.70. 
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Table 2 


Model estimates for BPRS data using factor analytic covariance structure 


Effect Notation Estimate 
Fixed Effects 
Week 8 Intercept, Group 1 Yo0 28.28 
Week 8 Intercept, Group 2 Yoo + Yor 33.98 
AWeek 8 Intercept Yor 5.70 
Slope at Week 8, Group 1 N10 -1.17 
Slope at Week 8, Group 2 Yio Vi 1.35 
ASlope at Week 8 Via 2.52 
Quadratic Slope, Group 1 Yn 0.19 
Quadratic Slope, Group 2 V2 + Ya 0.41 
AQuadratic Slope Yo 0.22 
Covariance Structure 

Intercept Loading Ay 6.95 
Linear Slope Loading A 2.16 
Quadratic Slope Loading A, 0.21 
Intercept Uniqueness dy 83.14 
Linear Slope Uniqueness d, 0.00 
Quadratic Slope Uniqueness as 0.03 
Residual Variance, Week 8 a” 10.74 
Heteroskedasticity Effect o —0.22 


SE 


2.59 
2.59 
3.67 
0.67 
0.67 
0.94 
0.09 
0.09 
0.13 


<.01 


Note: Treatment Group | was used as the reference group. The A parameters are calculated by 


32 


(Group 2 — Group 1). The Heteroskedasticity Effect is a log-linear model. Since Week 8 is coded 
as 0 so time is comprised of negative values, —0.22 means that that the residual variance at time f 


is e°** =1.25 times the residual variance at time f+1. For instance, the residual variance at 
Week 7 would be equal to the residual variance at Week 8 times 1.25, (10.74 1.25) = 13.43. 
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Table 3 


Percentage of converged replications by simulation condition 


N=50 
RM UNR OP CD 
4 (narrow) 14 44 39 
6 25 58 58 
4 (wide) 33 71 #71 
8 67 97 = 97 


FA 
44 
V7 
86 
99 


UNR 
16 
31 
33 
68 


N= 100 
OP CD 
44 45 
67 67 
82 82 
100 100 


FA 
53 
87 
96 
100 


UNR 
16 
37 
33 
61 


N = 200 
OP CD 
45 46 
82 82 
93 93 
100 100 
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FA 
59 
95 
99 
100 


Note: N = Sample Size, RM = Repeated Measures, UNR = Unrestricted Structure with Standard 
Deviation-Correlation Parameterization, OP = Orthogonal Polynomial with Unrestricted 


Structure, CD = Cholesky Decomposition Parameterization of Unrestricted Structure, FA = 


Factor Analytic Structure. 
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Table 4 


54 


Percent relative bias for each standard deviation or correlation parameter by simulation condition and random effect covariance 
matrix structure 


N RM _ Element 


50 


50 


50 


4 


Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


UNR CD 
0 0 
84 169 

324 1349 
-12 -27 
22-60 
8 0 
0 0 
24 ©=—-28 
50-90 
40 7 
-5§ -19 
5 2 
0 
5 
-1 -l 
4 0 
1 2 


14 


11 


100 


100 


Element 

Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


=e An na ff SO 


N 
200 


200 


200 


RM 
4 


Element 

Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


Int SD 
Linear SD 
Quadratic SD 
Int, LS Corr 
Int, QS Corr 
LS, QS Corr 


UNR CD FA 


0 
47 
191 


Fe WR NY NY O&O 


Note: N = Sample Size, RM = Repeated Measures, UNR = Unrestricted Structure with Standard Deviation-Correlation 
Parameterization, CD = Cholesky Decomposition, FA = Factor Analytic Structure, Int = Intercept, LS = Linear Slopes, QS = 
Quadratic Slope, SD = Standard Deviation, Corr = Correlation. Relative bias is only calculated for replications that converged for 


each method, so values across methods are not based on the same number of replications. 


1 
81 


oo or Fe Oo 


0 
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Table 5 


Fixed effect 95% confidence interval coverage rates by simulation conditions and random effect covariance matrix structure 


N RM Effect 

50 4 Time 
Time? 
Time’xTreat 
TimexTreat 


Treat 


50 6 Time 

Time? 
Time’xTreat 
TimexTreat 


Treat 


50 8 Time 

Time? 
Time’xTreat 
TimexTreat 


Treat 


UNR UNR2 
97 94 
98 93 
9795 
96 95 
94 94 
96 92 
97 94 
98 94 
97 «92 
94 94 
9% = 93 
9% ~©~=—-90 
9 ~©= 9 
9% ~©=—-90 
94 94 


CD FA 
91 94 
92 94 
97 =97 
97 96 
96 896 
95 94 
96 96 
97 96 
96 94 
95 95 
95 95 
96 96 
96 96 
96 95 
95 95 


N 
100 


100 


100 


RM _ Effect 
4 Time 


Time 
Time?xTreat 


TimexTreat 


Treat 


Time 
Time 
Time2xTreat 


TimexTreat 


Treat 


Time 
Time 
Time?xTreat 


TimexTreat 


Treat 


UNR UNR2 CD 


94 


95 
95 
96 


93 
94 


92 
92 
93 
92 
94 


88 
87 
87 
89 
85 


96 
96 
96 
97 
95 


95 
95 
96 


96 
96 


94 
95 
95 


95 
96 


FA 
96 
96 
95 
96 
95 


95 
95 
95 


95 
96 


94 
95 
95 


95 
96 


ap) 


N 
200 


200 


200 


RM_ Effect 
4 Time 


Time? 
Time?2xTreat 


TimexTreat 


Treat 


Time 
Time? 
Time?xTreat 


TimexTreat 


Treat 


Time 
Time? 
Time?xTreat 


TimexTreat 


Treat 


UNR UNR2 
95 94 
95 95 
93 95 
96 93 
97 95 
97 93 
95 92 
95 92 
96 91 
94 94 
96 89 
95 85 
96 90 
95 91 
95 96 


Note: N = Sample Size, RM = Repeated Measures, UNR = Unrestricted Structure with Standard Deviation-Correlation 

Parameterization, UNR2 = Unrestricted 2x2 Structure With No Random Effect for the Quadratic Slope, CD = Cholesky 
Decomposition, FA = Factor Structure. UNR, CD, and FA are based on replications that converged in those conditions. UN2 is based 
on replications in which UNR did not converge. Bolded entries indicate coverage rates that are outside of reasonable limits suggested 
by Bradley (1978) 


CD 
96 
97 
95 
96 
96 


96 
94 
94 
95 
95 


95 
94 
96 


96 
95 


FA 
94 
95 
95 
95 
96 


96 
94 
94 
94 
95 


95 
94 
95 


96 
95 
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Table 6 
Estimates for different models for the High School Beyond data 


Unrestricted: Unrestricted: pais 
No Female or SES No SES Anatyiie 
Random Effects Random Effect 
Parameter Estimate SE  p Estimate SE p Estimate SE p 
Fixed Effects 
‘Intercept = 2©=—<C~Sti<C«~CC'«CTIN'S:«@Hss cc (ass—“(iéi S20 cc (sst—i Tw Cc 
Female -1.07 17 <1 -1.08 19 <¢oO]1 -1.08 19 <o] 
Minority —3.82 34 <1 —3.83 34 <1] —3.84 34 <0] 
SES 2.26 15 <Q] 2.26 15 <Q] 2.26 15 <0] 
School Mean Female —2.06 50 <Ol 2.06 50 <.01 —2.08 50 <.01 
School Mean Minority 2.65 51 <1 2.62 51 <0 2.52 51 <0l1 
School Mean SES 3.92. 39 £01 3.94 39 <0] 3.93 39 <Q] 
Private 2.11 .32 <Q] 2.10 32 <0] 2.10 .32 <0] 
Size 0.07 .02 <0] 0.06 .02 <.01 0.06 .02 <.¢Q1 
Female x Size -0.07 .03 <1 -0.07 03 1 -0.07 .03 Ol 
Minority x Private 2.06 49 <.O1 2.09 48 <.01 2.10 48 <.01 
SES x Private -0.95 .22 <0] -0.95 22 <0] -0.95 .23 <0] 
Female Contextual -1.00 53 .06 -0.98 53 .07 -1.00 53  .06 
Minority Contextual 1.17 .61 .06 1.21 61 05 1.32 .61  .03 
SES Contextual 1.67 .42 <.01 1.68 .42 <.01 1.67 .42 <.01 
Covariance Structure 
Intercept Variance 1.73 1.74 1.74 
Female Slope Variance --- 0.59 0.59 
Minority Slope Variance 0.83 0.81 0.81 
SES Slope Variance --- --- 0.08 
Int, Female Correlation --- —.19 = 25 
Int, Minority Correlation 32 31 34 
Int, SES Correlation --- --- a2, 
Female, Minority Correlation --- —.26 —.19 
Female, SES Correlation --- --- —.07 
Minority, SES Correlation --- --- 09 


Residual Variance 35.42 35.32 35.29 
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Figure 1. Empirical trajectory plots for Treatment Group | (top) and Treatment Group 2 


(bottom) 
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Figure 2. Average fitted trajectories for each treatment group 
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Figure 3. Comparison of path diagrams for random effect covariance matrix specifications for an 
unrestricted structure (top) and a factor analytic structure (bottom) 


